Fast PageRank Computation via a Sparse Linear System

Authors

  • Gianna M. Del Corso
  • Antonio Gulli
  • Francesco Romani
Abstract

Recently, the research community has devoted increased attention to reducing the computational time needed by web ranking algorithms. In particular, many techniques have been proposed to speed up the well-known PageRank algorithm used by Google. This interest is motivated by two dominant factors: (1) the web graph has huge dimensions and is subject to dramatic updates in terms of nodes and links, so the PageRank assignment tends to become obsolete very quickly; (2) many PageRank vectors need to be computed according to different choices of the personalization vectors or when adopting strategies of collusion detection. In this paper, we show how the PageRank computation in the original random surfer model can be transformed into the problem of computing the solution of a sparse linear system. The sparsity of the obtained linear system makes it possible to exploit the effectiveness of the Markov chain index reordering to speed up the PageRank computation. In particular, we rearrange the system matrix according to several permutations, and we apply different scalar and block iterative methods to solve smaller linear systems. We tested our approaches on web graphs crawled from the net. The largest one contains about 24 million nodes and more than 100 million links. On this web graph, the cost of computing the PageRank is reduced by 65% in terms of Mflops and by 92% in terms of time with respect to the commonly used power method.
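
The idea of trading the eigenvector iteration for a linear solve can be illustrated on a toy graph. The sketch below is not the authors' implementation: it assumes a damping factor of 0.85, a uniform personalization vector, a hypothetical 4-node graph with no dangling nodes, and plain Gauss-Seidel as the iterative solver (the reorderings and block methods from the abstract are not reproduced). Under those assumptions, the power iteration x <- alpha * P^T x + (1 - alpha) * v and the sparse system (I - alpha * P^T) x = (1 - alpha) * v have the same solution.

```python
# Toy comparison: PageRank via the power method vs. via a sparse linear system.
# Assumptions (not from the paper): ALPHA = 0.85, uniform personalization vector,
# a 4-node graph with no dangling nodes, Gauss-Seidel as the linear solver.

ALPHA = 0.85

# Adjacency list: node -> nodes it links to (every node has at least one out-link).
LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
N = len(LINKS)

# Sparse in-link view: for each node i, the pairs (j, 1/outdeg(j)) with a link j -> i.
IN_LINKS = {i: [] for i in range(N)}
for j, targets in LINKS.items():
    for i in targets:
        IN_LINKS[i].append((j, 1.0 / len(targets)))


def power_method(tol=1e-12, max_iter=1000):
    """Iterate x <- ALPHA * P^T x + (1 - ALPHA) * v until the 1-norm change is below tol."""
    x = [1.0 / N] * N
    for it in range(1, max_iter + 1):
        new = [ALPHA * sum(w * x[j] for j, w in IN_LINKS[i]) + (1 - ALPHA) / N
               for i in range(N)]
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return x, it


def gauss_seidel(tol=1e-12, max_iter=1000):
    """Solve (I - ALPHA * P^T) x = (1 - ALPHA) * v, reusing updated entries within a sweep."""
    x = [1.0 / N] * N
    for it in range(1, max_iter + 1):
        diff = 0.0
        for i in range(N):
            # The toy graph has no self-loops, so the system matrix has a unit diagonal.
            new_i = ALPHA * sum(w * x[j] for j, w in IN_LINKS[i]) + (1 - ALPHA) / N
            diff += abs(new_i - x[i])
            x[i] = new_i  # in-place update: later rows in this sweep see the new value
        if diff < tol:
            break
    return x, it


pr_power, it_power = power_method()
pr_gs, it_gs = gauss_seidel()
print("power method:", [round(p, 6) for p in pr_power], "iterations:", it_power)
print("Gauss-Seidel:", [round(p, 6) for p in pr_gs], "iterations:", it_gs)
```

The only difference between the two routines is that Gauss-Seidel overwrites each entry as soon as it is computed, which is why the order of the rows (and hence a permutation of the system matrix) can affect how quickly it converges.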

Similar resources

Euler-Richardson method preconditioned by weakly stochastic matrix algebras: a potential contribution to Pagerank computation

Let S be a column stochastic matrix with at least one full row. Then S describes a Pagerank-like random walk since the computation of the Perron vector x of S can be tackled by solving a suitable M-matrix linear system Mx = y, where M = I − τA, A is a column stochastic matrix and τ is a positive coefficient smaller than one. The Pagerank centrality index on graphs is a relevant example where th...

Fast PageRank Computation Via a Sparse Linear System (Extended Abstract)

The research community has devoted increased attention to reducing the computation time needed by Web ranking algorithms. Many efforts have been devoted to improving PageRank [4, 23], the well-known ranking algorithm used by Google. The core of PageRank exploits an iterative weight assignment of ranks to the Web pages, until a fixed point is reached. This fixed point turns out to be the (dominan...

Fast ranking algorithm for very large data

In this paper, we propose a new ranking method inspired by previous results on the diffusion approach to solving linear equations. We describe new mathematical equations corresponding to this method and show through experimental results the potential computational gain. This ranking method is also compared to the well-known PageRank model. Keywords-Large sparse matrix, Iteration, Fixed point, Pa...

Approximation of Largest Eigenpairs of Matrices and Applications to Pagerank Computation

In this work, we propose different approaches for the treatment of the following problems: (i) computation of the largest eigenvalue of a matrix and the corresponding eigenvector when neither is known, (ii) computation of the eigenvector of a matrix corresponding to its largest eigenvalue when this eigenvalue is known. The matrix is arbitrary, large, and sparse. We treat the first problem by Kryl...

Approximating Personalized PageRank with Minimal Use of Web Graph Data

In this paper, we consider the problem of calculating fast and accurate approximations to the personalized PageRank score ([8, 16]) of a webpage. We focus on techniques to improve speed by limiting the amount of webgraph data we need to access. PageRank scores are mainly used for ranking purposes, and generally only the scores exceeding a given threshold are relevant. In practice, and relative ...

Journal:
  • Internet Mathematics

Volume 2

Publication date: 2005